80 research outputs found

npInv: accurate detection and genotyping of inversions using long read sub-alignment

Author: Cao Minh Duc
Coin Lachlan J. M.
Duarte Tania
Ganesamoorthy Devika
Hoggart Clive J.
Shao Haojing
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2018

BACKGROUND: Detection of genomic inversions remains challenging. Many existing methods primarily target inversions with a non-repetitive breakpoint, leaving inverted repeat (IR) mediated non-allelic homologous recombination (NAHR) inversions largely unexplored. RESULT: We present npInv, a novel tool specifically for detecting and genotyping NAHR inversions using long read sub-alignment of long read sequencing data. We benchmark npInv with other tools in both simulation and real data. We use npInv to generate a whole-genome inversion map for NA12878 consisting of 30 NAHR inversions (of which 15 are novel), including all previously known NAHR-mediated inversions in NA12878 with flanking IR less than 7kb. Our genotyping accuracy on this dataset was 94%. We used PCR to confirm the presence of two of these novel inversions. We show that there is a near linear relationship between the length of flanking IR and the minimum inversion size, without inverted repeats. CONCLUSION: The application of npInv shows high accuracy in both simulation and real data. The results give deeper insight into understanding inversion

Directory of Open Access Journals

University of Queensland eSpace

Fregene: Simulation of realistic sequence-level data in populations and ascertained samples

Author: AG Clark
B Peng
BS Weir
CJ Hoggart
Clive J Hoggart
David J Balding
DJ Balding
E Setakis
I Tachmazidou
J Hey
JL Davies
John C Whittaker
Marc Chadeau-Hyam
Maria De Iorio
MJ Minichiello
Paul F O'Reilly
S Schaffner
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: FREGENE simulates sequence-level data over large genomic regions in large populations. Because, unlike coalescent simulators, it works forwards through time, it allows complex scenarios of selection, demography, and recombination to be modelled simultaneously. Detailed tracking of sites under selection is implemented in FREGENE and provides the opportunity to test theoretical predictions and gain new insights into mechanisms of selection. We describe here the main functionalities of both FREGENE and SAMPLE, a companion program that can replicate association study datasets. Results: We report detailed analyses of six large simulated datasets that we have made publicly available. Three demographic scenarios are modelled: one panmictic, one substructured with migration, and one complex scenario that mimics the principal features of genetic variation in major worldwide human populations. For each scenario there is one neutral simulation, and one with a complex pattern of selection. Conclusion: FREGENE and the simulated datasets will be valuable for assessing the validity of models for selection, demography and population genetic parameters, as well as the efficacy of association studies. Its principal advantages are modelling flexibility and computational efficiency. It is open source and object-oriented. As such, it can be customised and the range of models extended

Crossref

LSHTM Research Online

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UCL Discovery

King's Research Portal

University of Melbourne Institutional Repository

Evaluation of Host Serum Protein Biomarkers of Tuberculosis in sub-Saharan Africa.

Author: Banda Louis
Chegou Novel N
Crampin Amelia C
Dockrell Hazel M
French Neil
Goliath Rene
Hamilton Melissa S
Hoggart Clive J
Kidd Martin
Levin Michael
Morris Thomas C
Oni Tolu
Sichali Lifted
Walzl Gerhard
Wilkinson Katalin A
Wilkinson Robert J
Publication venue: Front Immunol
Publication date: 01/01/2021

Accurate and affordable point-of-care diagnostics for tuberculosis (TB) are needed. Host serum protein signatures have been derived for use in primary care settings; however, validation of these in secondary care settings is lacking. We evaluated serum protein biomarkers discovered in primary care cohorts from Africa reapplied to patients from secondary care. In this nested case-control study, concentrations of 22 proteins were quantified in sera from 292 patients from Malawi and South Africa who presented predominantly to secondary care. Recruitment was based upon intention of local clinicians to test for TB. The case definition for TB was culture positivity for Mycobacterium tuberculosis; and for other diseases (OD) a confirmed alternative diagnosis. Equal numbers of TB and OD patients were selected. Within each group, there were equal numbers with and without HIV and from each site. Patients were split into training and test sets for biosignature discovery. A nine-protein signature to distinguish TB from OD was discovered comprising fibrinogen, alpha-2-macroglobulin, CRP, MMP-9, transthyretin, complement factor H, IFN-gamma, IP-10, and TNF-alpha. This signature had an area under the receiver operating characteristic curve in the training set of 90% (95% CI 86-95%), and, after adjusting the cut-off for increased sensitivity, a sensitivity and specificity in the test set of 92% (95% CI 80-98%) and 71% (95% CI 56-84%), respectively. The best single biomarker was complement factor H [area under the receiver operating characteristic curve 70% (95% CI 64-76%)]. Biosignatures consisting of host serum proteins may function as point-of-care screening tests for TB in African hospitals. Complement factor H is identified as a new biomarker for such signatures

University of Liverpool Repository

LSHTM Research Online

Apollo (Cambridge)

Simultaneous Analysis of All SNPs in Genome-Wide and Re-Sequencing Association Studies

Author: A Genkin
AC Morrison
B Servin
CC Holmes
CJ Hoggart
Clive J. Hoggart
David J. Balding
DJ Lunn
EI George
I Gradshteyn
IP Gorlov
JE Griffin
John C. Whittaker
L Breiman
M Bazaraa
M West
Maria De Iorio
MR Osborne
N Patterson
NR Wray
PD Sasieni
Peter M. Visscher
PJ Brown
R Sladek
R Tibshirani
S Zhang
SF Schaffner
TH Meuwissen
TJ Mitchel
Y Li
Publication venue: Public Library of Science
Publication date: 01/01/2008

Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes, covariate adjustment, and can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500 K SNPs, a real genome-wide data set of 300 K SNPs, and a sequence-based dataset, each of which can be analysed in a few hours on a desktop workstation

CiteSeerX

Public Library of Science (PLOS)

Crossref

LSHTM Research Online

Directory of Open Access Journals

PubMed Central

UCL Discovery

University of Melbourne Institutional Repository

Natural and Orthogonal Interaction framework for modeling gene-environment interactions with application to lung cancer

Author: Amos Christopher I.
Andrew Angeline S.
Brenner Hermann
Duell Eric J.
Haugen Aage
Hoggart Clive
Hung Rayjean J.
Lazarus Philip
Liu Changlu
Ma Jianzhong
Matsuo Keitaro
Mayordomo Jose Ignacio
Schwartz Ann G.
Staratschek-Jox Andrea
Wichmann H.-Erich
Xiao Feifei
Xiong Momiao
Yang Ping
Publication venue: 'S. Karger AG'
Publication date: 24/07/2018

Objectives: We aimed at extending the Natural and Orthogonal Interaction (NOIA) framework, developed for modeling gene-gene interactions in the analysis of quantitative traits, to allow for reduced genetic models, dichotomous traits, and gene-environment interactions. We evaluate the performance of the NOIA statistical models using simulated data and lung cancer data. Methods: The NOIA statistical models are developed for additive, dominant, and recessive genetic models as well as for a binary environmental exposure. Using the Kronecker product rule, a NOIA statistical model is built to model gene-environment interactions. By treating the genotypic values as the logarithm of odds, the NOIA statistical models are extended to the analysis of case-control data. Results: Our simulations showed that power for testing associations while allowing for interaction using the NOIA statistical model is much higher than using functional models for most of the scenarios we simulated. When applied to lung cancer data, much smaller p values were obtained using the NOIA statistical model for either the main effects or the SNP-smoking interactions for some of the SNPs tested. Conclusion: The NOIA statistical models are usually more powerful than the functional models in detecting main effects and interaction effects for both quantitative traits and binary traits. Copyright (C) 2012 S. Karger AG, Base

Diposit Digital de la Universitat de Barcelona

Diagnostic Test Accuracy of a 2-Transcript Host RNA Signature for Discriminating Bacterial vs Viral Infection in Febrile Children.

Author: Barendregt Anouk M
Burns Jane C
Carter Michael J
Cebey-López Miriam
Coin Lachlan JM
Eleftherohorinou Hariklia
Faust Saul N
Gormley Stuart
Herberg Jethro A
Hoggart Clive J
IRIS Consortium
Janes Victoria A
Kaforou Myrsini
Kanegaye John
Kuijpers Taco
Levin Michael
Martinón-Torres Federico
Patel Sanjay
Pollard Andrew J
Salas Antonio
Shailes Hannah
Shimizu Chisato
Tremoulet Adriana H
Wright Victoria J
Publication venue: 'American Medical Association (AMA)'
Publication date: 01/01/2016

IMPORTANCE: Because clinical features do not reliably distinguish bacterial from viral infection, many children worldwide receive unnecessary antibiotic treatment, while bacterial infection is missed in others. OBJECTIVE: To identify a blood RNA expression signature that distinguishes bacterial from viral infection in febrile children. DESIGN, SETTING, AND PARTICIPANTS: Febrile children presenting to participating hospitals in the United Kingdom, Spain, the Netherlands, and the United States between 2009-2013 were prospectively recruited, comprising a discovery group and validation group. Each group was classified after microbiological investigation as having definite bacterial infection, definite viral infection, or indeterminate infection. RNA expression signatures distinguishing definite bacterial from viral infection were identified in the discovery group and diagnostic performance assessed in the validation group. Additional validation was undertaken in separate studies of children with meningococcal disease (n = 24) and inflammatory diseases (n = 48) and on published gene expression datasets. EXPOSURES: A 2-transcript RNA expression signature distinguishing bacterial infection from viral infection was evaluated against clinical and microbiological diagnosis. MAIN OUTCOMES AND MEASURES: Definite bacterial and viral infection was confirmed by culture or molecular detection of the pathogens. Performance of the RNA signature was evaluated in the definite bacterial and viral group and in the indeterminate infection group. RESULTS: The discovery group of 240 children (median age, 19 months; 62% male) included 52 with definite bacterial infection, of whom 36 (69%) required intensive care, and 92 with definite viral infection, of whom 32 (35%) required intensive care. Ninety-six children had indeterminate infection. Analysis of RNA expression data identified a 38-transcript signature distinguishing bacterial from viral infection. 
A smaller (2-transcript) signature (FAM89A and IFI44L) was identified by removing highly correlated transcripts. When this 2-transcript signature was implemented as a disease risk score in the validation group (130 children, with 23 definite bacterial, 28 definite viral, and 79 indeterminate infections; median age, 17 months; 57% male), all 23 patients with microbiologically confirmed definite bacterial infection were classified as bacterial (sensitivity, 100% [95% CI, 100%-100%]) and 27 of 28 patients with definite viral infection were classified as viral (specificity, 96.4% [95% CI, 89.3%-100%]). When applied to additional validation datasets from patients with meningococcal and inflammatory diseases, bacterial infection was identified with a sensitivity of 91.7% (95% CI, 79.2%-100%) and 90.0% (95% CI, 70.0%-100%), respectively, and with specificity of 96.0% (95% CI, 88.0%-100%) and 95.8% (95% CI, 89.6%-100%). Of the children in the indeterminate groups, 46.3% (63/136) were classified as having bacterial infection, although 94.9% (129/136) received antibiotic treatment. CONCLUSIONS AND RELEVANCE: This study provides preliminary data regarding test accuracy of a 2-transcript host RNA signature discriminating bacterial from viral infection in febrile children. Further studies are needed in diverse groups of patients to assess accuracy and clinical utility of this test in different clinical settings
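The validation-group sensitivity and specificity quoted above follow directly from the reported counts (23 of 23 definite bacterial cases classified as bacterial, 27 of 28 definite viral cases classified as viral). A minimal sketch of that arithmetic, with a hypothetical helper name and no attempt to reproduce the confidence intervals:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from a 2x2 confusion table.

    tp/fn: bacterial cases classified as bacterial / as viral.
    tn/fp: viral cases classified as viral / as bacterial.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Counts taken from the abstract: 23/23 bacterial, 27/28 viral.
sens, spec = sensitivity_specificity(tp=23, fn=0, tn=27, fp=1)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
# prints "sensitivity = 100.0%, specificity = 96.4%"
```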

LSHTM Research Online

Oxford University Research Archive

University of Queensland eSpace

Genome-wide association study of primary tooth eruption identifies pleiotropic loci associated with height and craniofacial distances

Author: Alexei I. Zhurov
Anja Taanila
Anneli Pouta
Arshed M. Toma
Bakrania
Bayés de Luna
Beate St Pourcain
Bei
Bjarke Feenstra
Clive J. Hoggart
Cole
Cornelis
David M. Evans
Demetris Pillas
Di Cello
Eleftherohorinou
Elks
Ellen A. Nohr
Frank Geller
Freedman
Fujiwara
Geller
George Davey Smith
Ghazaleh Fatemifar
Golding
Goyal
Gu
Gudbjartsson
Hammarsund
Haraguchi
Hikake
Holmans
Horikoshi
Houlston
Hughes
Inga Prokopenko
Jernvall
Jia
John P. Kemp
Jon H. Tobias
Jussila
Kang
Kere
Kirsi Sipila
Lango Allen
Lavinia Paternoster
Levy
Li
Ligon
Mads Melbye
Marjo-Riitta Jarvelin
Meindl
Mikkola
Momoko Horikoshi
Mustonen
Nicholas J. Timpson
Nohr
Nunes
Nyholt
Paternoster
Pillas
Pispa
Pruim
Raija Lähdesmäki
Sabatti
Sakuraba
Simón-Sánchez
Soranzo
Srivastava
Stein
Stephen Richmond
Susan M. Ring
Taal
Taviaux
Tummers
Vainio
Vaz
Victoria J. Wright
Weedon
Willer
Wise
Yabe
Yang
Zhang
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2013

Twin and family studies indicate that the timing of primary tooth eruption is highly heritable, with estimates typically exceeding 80%. To identify variants involved in primary tooth eruption we performed a population based genome-wide association study of ‘age at first tooth’ and ‘number of teeth’ using 5998 and 6609 individuals respectively from the Avon Longitudinal Study of Parents and Children (ALSPAC) and 5403 individuals from the 1966 Northern Finland Birth Cohort (NFBC1966). We tested 2,446,724 SNPs imputed in both studies. Analyses were controlled for the effect of gestational age, sex and age of measurement. Results from the two studies were combined using fixed effects inverse variance meta-analysis. We identified a total of fifteen independent loci, with ten loci reaching genome-wide significance (p < 5×10^−8) for ‘age at first tooth’ and eleven loci for ‘number of teeth’. Together these associations explain 6.06% of the variation in ‘age at first tooth’ and 4.76% of the variation in ‘number of teeth’. The identified loci included eight previously unidentified loci, some containing genes known to play a role in tooth and other developmental pathways, including a SNP in the protein-coding region of BMP4 (rs17563, P = 9.080×10^−17). Three of these loci, containing the genes HMGA2, AJUBA and ADK, also showed evidence of association with craniofacial distances, particularly those indexing facial width. Our results suggest that the genome-wide association approach is a powerful strategy for detecting variants involved in tooth eruption, and potentially craniofacial growth and more generally organ development
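The abstract states that results from the two cohorts were combined by fixed effects inverse variance meta-analysis. A minimal sketch of that standard estimator is given here for illustration; the function name and the input numbers are hypothetical and are not taken from the study.

```python
import math


def fixed_effect_meta(betas, ses):
    """Combine per-study effect estimates with inverse-variance weights.

    betas: per-study effect estimates for one SNP.
    ses:   matching standard errors.
    Returns the pooled estimate, its standard error, and the z statistic.
    """
    weights = [1.0 / se ** 2 for se in ses]  # weight = 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


# Made-up estimates for one SNP in two cohorts (illustration only).
beta, se, z = fixed_effect_meta(betas=[0.10, 0.20], ses=[0.05, 0.05])
print(f"pooled beta = {beta:.3f}, se = {se:.4f}, z = {z:.2f}")
```

With equal standard errors the pooled estimate is the plain average of the two study estimates, and the pooled standard error shrinks by a factor of the square root of two.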

Crossref

Online Research @ Cardiff

OPUS - University of Technology Sydney

UCL Discovery

PubMed Central

Copenhagen University Research Information System

Spiral - Imperial College Digital Repository

MPG.PuRe

Explore Bristol Research

MultiPhen: Joint Model of Multiple Phenotypes Can Increase Discovery in GWAS

Author: C Cotsapas
C Gieger
Clive J. Hoggart
DR Nyholt
EK Speliotes
Federico C. F. Calboli
G Thorleifsson
GB Ehret
J Marchini
KS Small
L Klei
LA Hindorff
Lachlan J. M. Coin
MA Ferreira
Marjo-Riitta Jarvelin
N Sattar
Paul Elliott
Paul F. O’Reilly
PR Burton
Q Yang
R Sladek
S Kim
SA Pendergrass
SE Medland
Stacey Cherny
T Illig
TM Teslovich
TY Wong
WT Friedewald
Yotsawat Pomyen
Z Šidák
Publication venue: Public Library of Science
Publication date: 08/03/2012

The genome-wide association study (GWAS) approach has discovered hundreds of genetic variants associated with diseases and quantitative traits. However, despite clinical overlap and statistical correlation between many phenotypes, GWAS are generally performed one-phenotype-at-a-time. Here we compare the performance of modelling multiple phenotypes jointly with that of the standard univariate approach. We introduce a new method and software, MultiPhen, that models multiple phenotypes simultaneously in a fast and interpretable way. By performing ordinal regression, MultiPhen tests the linear combination of phenotypes most associated with the genotypes at each SNP, and thus potentially captures effects hidden to single phenotype GWAS. We demonstrate via simulation that this approach provides a dramatic increase in power in many scenarios. There is a boost in power for variants that affect multiple phenotypes and for those that affect only one phenotype. While other multivariate methods have similar power gains, we describe several benefits of MultiPhen over these. In particular, we demonstrate that other multivariate methods that assume the genotypes are normally distributed, such as canonical correlation analysis (CCA) and MANOVA, can have highly inflated type-1 error rates when testing case-control or non-normal continuous phenotypes, while MultiPhen produces no such inflation. To test the performance of MultiPhen on real data we applied it to lipid traits in the Northern Finland Birth Cohort 1966 (NFBC1966). In these data MultiPhen discovers 21% more independent SNPs with known associations than the standard univariate GWAS approach, while applying MultiPhen in addition to the standard approach provides 37% increased discovery. 
The most associated linear combinations of the lipids estimated by MultiPhen at the leading SNPs accurately reflect the Friedewald Formula, suggesting that MultiPhen could be used to refine the definition of existing phenotypes or uncover novel heritable phenotypes
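For readers unfamiliar with it, the Friedewald formula estimates LDL cholesterol as a fixed linear combination of other lipid measurements, which is why a data-driven linear combination of lipid traits can be compared against it. A minimal sketch, assuming concentrations in mg/dL (the function name is hypothetical; the formula is generally not considered reliable at high triglyceride levels):

```python
def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimate LDL cholesterol (mg/dL) with the Friedewald formula.

    LDL = total cholesterol - HDL - triglycerides / 5, i.e. a linear
    combination with coefficients (1, -1, -0.2) on the three inputs.
    """
    return total_chol - hdl - triglycerides / 5.0


# Illustrative values in mg/dL, not taken from any dataset.
ldl = friedewald_ldl(total_chol=200.0, hdl=50.0, triglycerides=150.0)
print(ldl)  # prints 120.0
```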

Public Library of Science (PLOS)

Crossref

Julkari

Directory of Open Access Journals

PubMed Central

Spiral - Imperial College Digital Repository

King's Research Portal

University of Melbourne Institutional Repository

University of Queensland eSpace

FigShare

Identification of novel locus associated with coronary artery aneurysms and validation of loci for susceptibility to Kawasaki disease

Author: Bellos Evan
Brogan Paul
Burgner David
Burns Jane C
Choi Shing Wan
Galassini Rachel
Herberg Jethro A
Hibberd Martin
Hoggart Clive
Kim Jihoon
Kuijpers Taco
Levin Michael
Menikou Stephanie
O'Connor Daniel
Patel Harsita
Pollard Andrew J
Sallah Neneh
Salo Eeva
Seaby Eleanor G
Shailes Hannah
Shimizu Chisato
Tremoulet Adriana H
van Stijn Diana
Wright Victoria J
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/10/2022

Kawasaki disease (KD) is a paediatric vasculitis associated with coronary artery aneurysms (CAA). Genetic variants influencing susceptibility to KD have been previously identified, but no risk alleles have been validated that influence CAA formation. We conducted a genome-wide association study (GWAS) for CAA in KD patients of European descent with 200 cases and 276 controls. A second GWAS for susceptibility pooled KD cases with healthy paediatric controls from vaccine trials in the UK (n = 1609). Logistic regression mixed models were used for both GWASs. The susceptibility GWAS was meta-analysed with 400 KD cases and 6101 controls from a previous European GWAS; these results were further meta-analysed with Japanese GWASs at two putative loci. The CAA GWAS identified an intergenic region of chromosome 20q13 with multiple SNVs showing genome-wide significance. The risk allele of the most associated SNV (rs6017006) was present in 13% of cases and 4% of controls; in East Asian 1000 Genomes data, the allele was absent or rare. Susceptibility GWAS with meta-analysis with previously published European data identified two previously associated loci (ITPKC and FCGR2A). Further meta-analysis with Japanese GWAS summary data from the CASP3 and FAM167A genomic regions validated these loci in Europeans, showing consistent effects of the top SNVs in both populations. We identified a novel locus for CAA in KD patients of European descent. The results suggest that different genes determine susceptibility to KD and development of CAA, and future work should focus on the function of the intergenic region on chromosome 20q13

UTUPub

Diagnosis of Kawasaki Disease Using a Minimal Whole-Blood Gene Expression Signature.

Author: Barendregt Anouk M
Berk Maurice
Burns Jane C
Coin Lachlan JM
Eleftherohorinou Hariklia
Glodé Mary P
Gormley Stuart
Herberg Jethro A
Hibberd Martin
Hoang Long Truong
Hoggart Clive J
Immunopathology of Respiratory Inflammatory and Infectious Dise
Kaforou Myrsini
Kanegaye John T
Kuijpers Taco W
Levin Michael
Menikou Stephanie
Shailes Hannah
Shimizu Chisato
Tremoulet Adriana H
Wright Victoria J
Publication venue: 'American Medical Association (AMA)'
Publication date: 01/01/2018

Importance: To date, there is no diagnostic test for Kawasaki disease (KD). Diagnosis is based on clinical features shared with other febrile conditions, frequently resulting in delayed or missed treatment and an increased risk of coronary artery aneurysms. Objective: To identify a whole-blood gene expression signature that distinguishes children with KD in the first week of illness from other febrile conditions. Design, Setting, and Participants: The case-control study comprised a discovery group that included a training and test set and a validation group of children with KD or comparator febrile illness. The setting was pediatric centers in the United Kingdom, Spain, the Netherlands, and the United States. The training and test discovery group comprised 404 children with infectious and inflammatory conditions (78 KD, 84 other inflammatory diseases, and 242 bacterial or viral infections) and 55 healthy controls. The independent validation group comprised 102 patients with KD, including 72 in the first 7 days of illness, and 130 febrile controls. The study dates were March 1, 2009, to November 14, 2013, and data analysis took place from January 1, 2015, to December 31, 2017. Main Outcomes and Measures: Whole-blood gene expression was evaluated using microarrays, and minimal transcript sets distinguishing KD were identified using a novel variable selection method (parallel regularized regression model search). The ability of transcript signatures (implemented as disease risk scores) to discriminate KD cases from controls was assessed by area under the curve (AUC), sensitivity, and specificity at the optimal cut point according to the Youden index. Results: Among 404 patients in the discovery set, there were 78 with KD (median age, 27 months; 55.1% male) and 326 febrile controls (median age, 37 months; 56.4% male). 
Among 202 patients in the validation set, there were 72 with KD (median age, 34 months; 62.5% male) and 130 febrile controls (median age, 17 months; 56.9% male). A 13-transcript signature identified in the discovery training set distinguished KD from other infectious and inflammatory conditions in the discovery test set, with AUC of 96.2% (95% CI, 92.5%-99.9%), sensitivity of 81.7% (95% CI, 60.0%-94.8%), and specificity of 92.1% (95% CI, 84.0%-97.0%). In the validation set, the signature distinguished KD from febrile controls, with AUC of 94.6% (95% CI, 91.3%-98.0%), sensitivity of 85.9% (95% CI, 76.8%-92.6%), and specificity of 89.1% (95% CI, 83.0%-93.7%). The signature was applied to clinically defined categories of definite, highly probable, and possible KD, resulting in AUCs of 98.1% (95% CI, 94.5%-100%), 96.3% (95% CI, 93.3%-99.4%), and 70.0% (95% CI, 53.4%-86.6%), respectively, mirroring certainty of clinical diagnosis. Conclusions and Relevance: In this study, a 13-transcript blood gene expression signature distinguished KD from other febrile conditions. Diagnostic accuracy increased with certainty of clinical diagnosis. A test incorporating the 13-transcript disease risk score may enable earlier diagnosis and treatment of KD and reduce inappropriate treatment in those with other diagnoses
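The abstract reports sensitivity and specificity at the optimal cut point according to the Youden index, which is the threshold maximising J = sensitivity + specificity − 1. A minimal sketch of that selection over a set of risk scores follows; the function name, scores and labels are made up for illustration, and this is not the authors' analysis code.

```python
def youden_cut_point(scores, labels):
    """Return (cut, J, sensitivity, specificity) maximising Youden's J.

    scores: disease risk scores, higher meaning more likely a case.
    labels: 1 for case, 0 for control.
    A sample is called positive when its score is >= the cut point.
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        sens = tp / n_cases
        spec = tn / n_controls
        j = sens + spec - 1.0
        if best is None or j > best[1]:
            best = (cut, j, sens, spec)
    return best


# Toy data: four cases and four controls.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 0, 1, 1]
cut, j, sens, spec = youden_cut_point(scores, labels)
print(cut, j, sens, spec)  # prints 0.4 0.75 1.0 0.75
```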

LSHTM Research Online

eScholarship - University of California

University of Queensland eSpace